Learning Distance Functions: Algorithms and Applications

Authors

  • Tomer Hertz
  • Daphna Weinshall
Abstract

This thesis presents research in the field of distance function learning. Distance functions are used extensively in various application domains and also serve as an important building block in many types of algorithms. Despite their abundance, until recently only canonical distance functions such as the Euclidean distance were used, or alternatively various application-specific distance functions were suggested, which in most cases were hand-designed to incorporate domain-specific knowledge. In the last several years there has been a growing body of work on algorithms for learning distance functions. A considerable number of distance learning algorithms have been suggested, most of which aim at learning a restricted form of distance function called a Mahalanobis metric. In this thesis I will present three novel distance learning algorithms:

  1. Relevant Component Analysis (RCA): an algorithm for learning a Mahalanobis metric using positive equivalence constraints.
  2. DistBoost: a boosting-based algorithm that can learn highly non-linear distance functions using equivalence constraints.
  3. KernelBoost: a variant of the DistBoost algorithm that learns kernel functions, which can be used in any kernel-based classifier.

I will then describe their applications to various data domains, which include clustering, image retrieval, computational immunology, auditory data analysis and kernel-based classification. In all of these application domains, significant improvement is achieved when using a learned distance function instead of a standard off-the-shelf distance function. These results demonstrate the importance of this growing research field.
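The abstract does not spell out how RCA builds its metric. As a rough illustration only, the sketch below assumes the commonly described formulation in which points known to belong together (positive equivalence constraints) are grouped into "chunklets", the average within-chunklet covariance is computed, and its inverse is used as the Mahalanobis matrix. The function names (`chunklet_covariance`, `solve`, `mahalanobis`) and the toy data are hypothetical, and steps such as dimensionality reduction are omitted.

```python
from typing import List, Sequence

Vector = Sequence[float]
Matrix = List[List[float]]


def chunklet_covariance(chunklets: Sequence[Sequence[Vector]]) -> Matrix:
    """Average within-chunklet covariance (assumed RCA-style estimate).

    Each point is centered on the mean of its own chunklet, so only the
    variability among points constrained to be equivalent is measured.
    """
    dim = len(chunklets[0][0])
    total = sum(len(chunk) for chunk in chunklets)
    cov = [[0.0] * dim for _ in range(dim)]
    for chunk in chunklets:
        mean = [sum(p[k] for p in chunk) / len(chunk) for k in range(dim)]
        for p in chunk:
            d = [p[k] - mean[k] for k in range(dim)]
            for a in range(dim):
                for b in range(dim):
                    cov[a][b] += d[a] * d[b] / total
    return cov


def solve(matrix: Matrix, rhs: Vector) -> List[float]:
    """Solve matrix @ z = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance is singular; add constraints or regularize")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (aug[i][n] - tail) / aug[i][i]
    return z


def mahalanobis(x: Vector, y: Vector, cov: Matrix) -> float:
    """Mahalanobis distance sqrt((x - y)^T cov^{-1} (x - y))."""
    diff = [a - b for a, b in zip(x, y)]
    z = solve(cov, diff)  # z = cov^{-1} (x - y), without forming the inverse
    return sum(d * v for d, v in zip(diff, z)) ** 0.5


# Toy data (hypothetical): two chunklets whose members differ mostly along x.
chunklets = [
    [(0.0, 0.0), (4.0, 1.0)],
    [(0.0, 6.0), (4.0, 5.0)],
]
cov = chunklet_covariance(chunklets)
d_along_x = mahalanobis((0.0, 0.0), (2.0, 0.0), cov)
d_along_y = mahalanobis((0.0, 0.0), (0.0, 2.0), cov)
```

Both test pairs are 2 apart in Euclidean distance, but under the learned metric the pair separated along x (the direction in which equivalent points vary a lot) ends up closer than the pair separated along y. That down-weighting of within-class variability is the effect a Mahalanobis metric learned from positive constraints is meant to capture.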
The first two chapters of this work present a general introduction to the field of distance functions and distance function learning, with some additional background on semi-supervised learning. Chapter 1, Introduction: In Chapter 1 we provide a general introduction to distance functions and give some reasons why the distance learning problem is an important and interesting learning scenario. We then provide a detailed overview of canonical and hand-designed distance functions. The algorithms presented in this thesis are all from the field of semi-supervised learning. We therefore present a short introduction to the field of semi-supervised learning, with a specific focus on learning using equivalence constraints, which is


Similar resources

Semi-Supervised Composite Kernel Learning Using Distance Metric Learning Techniques

Distance metrics play a key role in many machine learning and computer vision algorithms, so choosing an appropriate distance metric has a direct effect on the performance of such algorithms. Recently, distance metric learning using labeled data or other available supervisory information has become a very active research area in machine learning applications. Studies in this area have shown t...


Sample size determination for logistic regression

The problem of sample size estimation is important in medical applications, especially in cases of expensive measurements of immune biomarkers. This paper describes the problem of logistic regression analysis with sample size determination algorithms, namely the methods of univariate statistics, logistic regression, cross-validation and Bayesian inference. The authors, treating the regr...


EMCSO: An Elitist Multi-Objective Cat Swarm Optimization

This paper introduces a novel multi-objective evolutionary algorithm based on the cat swarm optimization algorithm (EMCSO) and its application to solving a multi-objective knapsack problem. Multi-objective optimizers try to find the solutions closest to the true Pareto front (POF), which is achieved by finding the less-crowded non-dominated solutions. The proposed method applies cat swarm optim...


Assessment of the Performance of Clustering Algorithms in the Extraction of Similar Trajectories

In recent years, the rapid growth of spatial trajectory data and the need to process it and extract useful information and meaningful patterns have attracted many researchers to the field of spatio-temporal trajectory clustering. The processing and analysis of these trajectories have resulted in the extraction of useful information whic...


An Effective Approach for Robust Metric Learning in the Presence of Label Noise

Many algorithms in machine learning, pattern recognition, and data mining are based on a similarity/distance measure. For example, the kNN classifier and clustering algorithms such as k-means require a similarity/distance function. Also, in Content-Based Information Retrieval (CBIR) systems, we need to rank the retrieved objects based on the similarity to the query. As generic measures such as ...


Publication date: 2006